Abstract

The literature on compressed sensing has focused almost entirely on settings where the signal is noiseless and the measurements are contaminated by noise. In practice, however, the signal itself is often subject to random noise prior to measurement. We briefly study this setting and show that, for the vast majority of measurement schemes employed in compressed sensing, the two models are equivalent, with the important difference that the signal-to-noise ratio is divided by a factor proportional to $p/n$, where $p$ is the dimension of the signal and $n$ is the number of observations. Since $p/n$ is often large, this leads to noise folding, which can have a severe impact on the SNR.

Keywords: Compressed sensing, matching pursuit, sparse signals, analog noise vs. digital noise, noise folding.

1 Introduction

The field of compressed sensing (CS), focused on the recovery of sparse vectors from few measurements, has been attracting vast interest in recent years due to its potential use in numerous signal processing applications [6,9,10]. The standard CS setup assumes that we are given measurements

$$y = Ax + w, \tag{1}$$

where $y \in \mathbb{R}^n$ is the measurement vector, $A \in \mathbb{R}^{n \times p}$ is the measurement matrix with $n \ll p$, and $w \in \mathbb{R}^n$ is additive noise. The signal $x \in \mathbb{R}^p$ is assumed to be $s$-sparse, so that no more than $s$
elements of x are nonzero. It is also common to assume that x is deterministic, an assumption 
we make throughout. To recover x from y a variety of algorithms have been developed. These 
include greedy algorithms, such as thresholding and orthogonal matching pursuit (OMP) [13], and 
relaxation methods, such as basis pursuit [8] (also known as the Lasso) and the Dantzig selector [5]. 
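To make model (1) and the greedy recovery methods concrete, here is a minimal NumPy sketch of orthogonal matching pursuit. It is an illustrative implementation in this letter's notation, not the exact algorithm of [13]: the function name `omp`, the assumption that the sparsity level $s$ is known, and the rule of stopping after exactly $s$ iterations are our own choices.

```python
import numpy as np


def omp(A: np.ndarray, y: np.ndarray, s: int) -> np.ndarray:
    """Sketch of orthogonal matching pursuit for y = A x + w.

    Assumes the sparsity level s is known and runs exactly s iterations.
    """
    p = A.shape[1]
    col_norms = np.linalg.norm(A, axis=0)
    support: list[int] = []
    coef = np.zeros(0)
    residual = y.astype(float)
    for _ in range(s):
        # Pick the column most correlated with the residual (normalized by column norm).
        scores = np.abs(A.T @ residual) / col_norms
        if support:
            scores[support] = -np.inf  # never reselect a chosen column
        support.append(int(np.argmax(scores)))
        # Least-squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(p)
    x_hat[support] = coef
    return x_hat


# Example: Gaussian A with i.i.d. N(0, 1/n) entries and a small measurement noise w.
rng = np.random.default_rng(0)
n, p, s = 64, 256, 4
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
x = np.zeros(p)
x[rng.choice(p, size=s, replace=False)] = rng.choice([-1.0, 1.0], size=s)
y = A @ x + 0.01 * rng.normal(size=n)
x_hat = omp(A, y, s)
print("recovery error:", np.linalg.norm(x_hat - x))
```

Running a fixed number of iterations keeps the sketch short; practical variants usually stop on a residual threshold tied to the noise level instead.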

An important aspect of CS analysis is to develop bounds on the recovery performance of these 
methods in the presence of noise. There are two standard approaches to modeling the noise $w$: either $w$ is assumed to be deterministic and bounded [6], or it is taken to be a white noise vector, typically
Gaussian [1,2,4,5]. The former setting leads to a worst-case analysis in which an estimator must 
perform adequately even when the noise maximally damages the measurements. By contrast, if one 
assumes that the noise is random, then better performance bounds can be obtained. We therefore 
focus here on the random noise scenario. 
Two standard measures used to analyze the behavior of CS recovery techniques are the coherence and the restricted isometry property (RIP) [7]. If the coherence and RIP constants of the measurement matrix are small enough, then standard recovery methods such as basis pursuit, OMP and thresholding can recover $x$ from $y$ with a squared error that is proportional to the sparsity level $s$ and the noise variance $\sigma^2$, times a factor that is logarithmic in the signal length $p$ [1,2,4,5].

In many practical settings, noise is introduced to the signal x prior to measurement. As an 
example, one of the applications of CS is to the design of sub-Nyquist A/D converters. In this 
context $x$ represents the analog signal at the entrance to the A/D converter, which is typically contaminated by noise [12]. Though important in practice, signal noise has not been treated in detail in the prolific literature on CS. Recently, several papers raised this important issue [3,11,14]. These works all point out that noise present in $x$ can have a severe impact on the recovery performance. Here we analyze this setting in more detail in the context of the finite-dimensional CS system (1), in contrast to the analog setting treated in [3], and provide theoretical justification for these previous observations. In particular, we show that under appropriate conditions on the measurement matrix $A$, the effect of pre-measurement noise is to degrade the signal-to-noise ratio (SNR) by a factor of $n/p$. In systems in which $n \ll p$ this degradation may be severe.

2 Noise Folding 

2.1 Problem Formulation 

The basic CS model (1) is adequate when the noise is introduced at the measurement stage, so that $w$ represents the measurement error or noise. However, in many practical scenarios, noise is added to the signal $x$ to be measured, which is not accounted for in (1). A more appropriate description in this case is

$$y = A(x + z) + w, \tag{2}$$

where $z$ represents the signal noise, i.e., additive noise that is part of the signal being measured. Our goal in this letter is to analyze the effect of the pre-measurement noise $z$ on the behavior of CS recovery methods. We assume throughout that $w$ is a random noise vector with covariance $\sigma^2 I$, and that, similarly, $z$ is a random noise vector with covariance $\sigma_0^2 I$, independent of $w$. Under these assumptions, we show that (2) is equivalent to

$$y = Bx + u, \tag{3}$$

where $B$ is a matrix whose coherence and RIP constants are very close to those of $A$, and $u$ is white noise with variance $\sigma^2 + (p/n)\sigma_0^2$.

It follows that in order to study (2) we can apply the results developed for (1), with the important difference that the noise associated with (2) is larger by a factor proportional to $p/n$. When $n \ll p$ this leads to a large noise increase, or noise folding. This effect is a result of the fact that the measurement matrix $A$ aliases, or combines, all the noise elements in $z$, even those corresponding to zero elements in $x$, thus leading to a large noise increase in the compressed measurements.

2.2 Equivalent Formulation

To establish our results, we note that (2) can be written as

$$y = Ax + v, \tag{4}$$

with $v$ defined by

$$v = w + Az. \tag{5}$$
Under the assumption of white noise, and since $w$ and $z$ are independent, the effective noise vector $v$ has covariance $Q$, where

$$Q = \sigma^2 I + \sigma_0^2 AA^T. \tag{6}$$



As can be seen, in general v is no longer white, which complicates the recovery analysis. 
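The covariance (6) and the folding of the signal noise can be checked by simulation. The NumPy sketch below fixes one Gaussian matrix $A$ with i.i.d. $N(0, 1/n)$ entries (an assumed ensemble), draws many independent pairs $(w, z)$ with Gaussian entries, and compares the empirical covariance of $v = w + Az$ with $Q$. The average per-measurement variance comes out close to $\sigma^2 + (p/n)\sigma_0^2$, because each entry of $Az$ sums $p$ noise terms weighted by entries of variance $1/n$. The sizes, seed and noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 64
sigma, sigma0 = 1.0, 1.0
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))  # fixed measurement matrix

# Predicted covariance of v = w + A z, equation (6).
Q = sigma**2 * np.eye(n) + sigma0**2 * (A @ A.T)

# Monte Carlo estimate from independent draws of w ~ N(0, sigma^2 I), z ~ N(0, sigma0^2 I).
trials = 50_000
w = sigma * rng.normal(size=(trials, n))
z = sigma0 * rng.normal(size=(trials, p))
v = w + z @ A.T            # row t is one realization of v
Q_hat = v.T @ v / trials   # v has mean zero, so no centering is needed

rel_err = np.linalg.norm(Q_hat - Q) / np.linalg.norm(Q)
avg_var = np.trace(Q_hat) / n  # average noise variance per measurement
print(f"relative error {rel_err:.3f}, average variance {avg_var:.2f}, "
      f"sigma^2 + (p/n) sigma0^2 = {sigma**2 + p / n * sigma0**2:.2f}")
```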

A simple special case is when $AA^T$ is proportional to the identity, so that $v$ is still white noise. As an example, suppose that $A$ is the concatenation of $r = p/n$ orthonormal bases, i.e., $A = [A_1 \cdots A_r]$, where each $A_k$ is an $n \times n$ orthogonal matrix; for example, we may want to analyze a signal with a few bases (e.g., wavelets and sinusoids) as in [8]. In this case,

$$AA^T = A_1A_1^T + \cdots + A_rA_r^T = rI = \frac{p}{n}I,$$

so that the noise covariance (6) becomes $Q = \gamma I$ with

$$\gamma = \sigma^2 + \frac{p}{n}\sigma_0^2. \tag{7}$$

In this special case the models (4) (or (2)) and (1) are identical, the only difference being that the noise variance of $v$ is larger than that of $w$ by a factor of $\gamma/\sigma^2 = 1 + (p/n)\sigma_0^2/\sigma^2$. Assuming that $\sigma_0^2 \approx \sigma^2$, the increase in noise is proportional to $p/n$, a simple case of noise folding.
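The special case above is easy to reproduce. In this NumPy sketch, random orthogonal matrices obtained from QR factorizations of Gaussian matrices stand in for the bases (e.g., wavelets and sinusoids); this substitution, and the helper names `concat_orthonormal_bases`, `folded_noise_variance` and `snr_loss_db`, are illustrative assumptions. The last helper expresses the noise increase $\gamma/\sigma^2$ from (7) in decibels.

```python
import numpy as np


def concat_orthonormal_bases(n: int, r: int, seed: int = 0) -> np.ndarray:
    """Return A = [A_1 ... A_r], each A_k a random n x n orthogonal matrix (illustrative)."""
    rng = np.random.default_rng(seed)
    blocks = [np.linalg.qr(rng.normal(size=(n, n)))[0] for _ in range(r)]
    return np.hstack(blocks)


def folded_noise_variance(sigma2: float, sigma02: float, p: int, n: int) -> float:
    """gamma = sigma^2 + (p/n) sigma_0^2, equation (7)."""
    return sigma2 + (p / n) * sigma02


def snr_loss_db(sigma2: float, sigma02: float, p: int, n: int) -> float:
    """Noise increase of model (4) over model (1), in dB: 10 log10(gamma / sigma^2)."""
    return 10.0 * np.log10(folded_noise_variance(sigma2, sigma02, p, n) / sigma2)


n, r = 32, 8
A = concat_orthonormal_bases(n, r)
p = A.shape[1]  # p = r * n
print(np.allclose(A @ A.T, r * np.eye(n)))  # AA^T = rI = (p/n) I
print(snr_loss_db(1.0, 1.0, p, n))          # about 9.54 dB when sigma0^2 = sigma^2, p/n = 8
```

With $\sigma_0^2 = \sigma^2$ and $p/n = 8$, the noise variance grows by a factor of 9, which is roughly a 9.5 dB loss in SNR.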

In the next section we show that this result holds more generally. Namely, the models (4) and (1) are roughly equivalent, with a noise increase of $\gamma/\sigma^2$, even when $AA^T$ is not proportional to the identity.



3 RIP and Coherence with Whitening 

Consider now a more general CS setting, where $A$ is an arbitrary matrix with low coherence or small RIP constants. To proceed, we first whiten the noise $v$ in (4) by multiplying the linear system by $Q_1^{-1/2}$, where $Q_1 := Q/\gamma$, obtaining the equivalent system

$$Q_1^{-1/2}y = Bx + u, \qquad B := Q_1^{-1/2}A, \quad u := Q_1^{-1/2}v. \tag{8}$$

Now the noise vector $u$ is white with covariance matrix $Q_1^{-1/2}QQ_1^{-1/2} = \gamma I$, just as in the case of $AA^T$ proportional to the identity. The main difference, however, is that the whitening changed the measurement matrix from $A$ to $B$. We quantify the magnitude of this change below via the RIP constants and the coherence. As we show, for standard matrices used in CS, the change is generally not very significant.
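The whitening step (8) can be sketched in a few lines of NumPy. Since $Q_1$ is symmetric positive definite, its inverse square root can be formed from an eigendecomposition; the helper name `whiten` and the Gaussian test matrix are illustrative assumptions. The check confirms that $u = Q_1^{-1/2}v$ has covariance $Q_1^{-1/2}QQ_1^{-1/2} = \gamma I$.

```python
import numpy as np


def whiten(A: np.ndarray, sigma2: float, sigma02: float):
    """Return (B, W, gamma) with W = Q1^{-1/2} and B = W A, as in (8).

    Q = sigma^2 I + sigma_0^2 A A^T (eq. (6)), gamma from eq. (7), Q1 = Q / gamma.
    """
    n, p = A.shape
    gamma = sigma2 + (p / n) * sigma02
    Q1 = (sigma2 * np.eye(n) + sigma02 * (A @ A.T)) / gamma
    # Q1 is symmetric positive definite: Q1^{-1/2} = V diag(lambda^{-1/2}) V^T.
    evals, evecs = np.linalg.eigh(Q1)
    W = (evecs / np.sqrt(evals)) @ evecs.T
    return W @ A, W, gamma


rng = np.random.default_rng(0)
n, p = 20, 400
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
B, W, gamma = whiten(A, sigma2=1.0, sigma02=1.0)
Q = np.eye(n) + A @ A.T
# Covariance of u = W v is W Q W^T, which should equal gamma * I.
print(np.allclose(W @ Q @ W.T, gamma * np.eye(n)))
```

A recovery routine designed for model (1) can then be applied to the pair $(Wy, B)$ with noise variance $\gamma$.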

Our results hinge on approximating $AA^T$ by $(p/n)I$ even when $A$ is arbitrary. Let

$$\eta := \|I - (n/p)AA^T\|_2 \tag{9}$$

measure the quality of this approximation, where $\|\cdot\|_2$ denotes the standard operator norm on $\mathbb{R}^n$ (the spectral norm). In our derivations below we assume that $\eta$ is small. Under this assumption we show that the coherence and RIP constants of $B$ and $A$ are very similar.

To justify the assumption that $\eta$ is often small, note that when the entries of $A$ are i.i.d. zero-mean, variance-$1/n$ random variables with a sub-gaussian distribution (including the normal, uniform, and Bernoulli distributions), or when the column vectors are i.i.d. uniform on the sphere, then $\eta \le C\sqrt{n/p}$ with probability at least $1 - \exp(-cn)$, for constants $C, c > 0$ depending only on the distribution of $A$ [15, Th. 39]. For example, when the entries of $A$ are i.i.d. $N(0, 1/n)$ and $n$ is large enough,

$$\eta \le 2\sqrt{n/p} + n/p + 4t/\sqrt{p}, \tag{10}$$

with probability at least $1 - 2\exp(-t^2/2)$, for $0 < t < \sqrt{n}$ and $p > n$ [15, Cor. 35]. A similar result holds for other distributions, including heavy-tailed distributions, the basic requirement being that the column vectors of $A$ are independent with covariance $I/n$ [15]. These assumptions are standard in the CS literature. Thus, in these prevalent settings, $\eta$ will be small with high probability whenever $n \ll p$.
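The quantity $\eta$ in (9) is a single spectral norm, so its size for Gaussian matrices can be compared directly with the right-hand side of (10). In this NumPy sketch, the helper names `eta` and `gaussian_bound`, the choice $t = 3$, and the dimensions are illustrative assumptions; a concatenation of two identity matrices, for which $AA^T = (p/n)I$ exactly, gives $\eta = 0$.

```python
import numpy as np


def eta(A: np.ndarray) -> float:
    """eta = || I - (n/p) A A^T ||_2 from (9), with the spectral norm."""
    n, p = A.shape
    return float(np.linalg.norm(np.eye(n) - (n / p) * (A @ A.T), ord=2))


def gaussian_bound(n: int, p: int, t: float) -> float:
    """Right-hand side of (10): 2 sqrt(n/p) + n/p + 4 t / sqrt(p)."""
    return 2 * np.sqrt(n / p) + n / p + 4 * t / np.sqrt(p)


rng = np.random.default_rng(0)
n = 100
for p in (1_000, 4_000, 16_000):
    A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))  # i.i.d. N(0, 1/n) entries
    print(p, round(eta(A), 3), round(gaussian_bound(n, p, t=3.0), 3))

# Exact case: AA^T = (p/n) I, so eta vanishes.
print(eta(np.hstack([np.eye(4), np.eye(4)])))
```

As $p$ grows with $n$ fixed, both $\eta$ and the bound shrink roughly like $\sqrt{n/p}$.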



3.1 Restricted Isometry Analysis 

We begin by showing that the restricted isometry constants of $B$ and $A$ are similar when $\eta$ is small.

For an index set $\Lambda \subset \{1, \dots, p\}$ of size $s$, let $A_\Lambda$ denote the submatrix of $A$ made of the column vectors indexed by $\Lambda$. We say that $A$ has the RIP of order $s$ with constants $0 < \alpha_s \le \beta_s$ if

$$\alpha_s\|h\|^2 \le \|A_\Lambda h\|^2 \le \beta_s\|h\|^2, \quad \forall h \in \mathbb{R}^s, \tag{11}$$

for any index set $\Lambda \subset \{1, \dots, p\}$ of size $s$. The following proposition relates the RIP constants of $B$ and $A$.

Proposition 1. Assume that $\eta < 1/2$ in (9) and that $A$ satisfies the RIP of order $s$ with constants $0 < \alpha_s \le \beta_s$. Then $B$ satisfies the RIP of order $s$ with constants $\alpha_s(1 - \eta_1)$ and $\beta_s(1 + \eta_1)$, where

$$\eta_1 := \frac{\eta}{1 - \eta}.$$

Though the bound is valid for $\eta < 1$, the smaller RIP constant for $B$ is only positive when $\eta < 1/2$, hence our restriction.

Proof. The proof is based on the fact that $Q_1$ is close to $I$. Indeed, by the definition of $\eta$ in (9),

$$\|Q_1 - I\|_2 = \frac{\sigma_0^2}{\gamma}\|AA^T - (p/n)I\|_2 = \frac{\sigma_0^2(p/n)}{\sigma^2 + \sigma_0^2(p/n)}\,\eta \le \eta.$$

Next, we express $Q_1^{-1} - I$ as a power series,

$$Q_1^{-1} - I = (I - (I - Q_1))^{-1} - I = \sum_{k \ge 1}(I - Q_1)^k,$$

which converges since $\|Q_1 - I\|_2 \le \eta < 1$ and $\|\cdot\|_2$ is an operator norm. Taking norms on both sides and using both the triangle inequality and, again, the fact that $\|\cdot\|_2$ is an operator norm, we obtain

$$\|Q_1^{-1} - I\|_2 \le \sum_{k \ge 1}\|Q_1 - I\|_2^k \le \sum_{k \ge 1}\eta^k = \frac{\eta}{1 - \eta} = \eta_1. \tag{12}$$

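Let $\Lambda$ be an index set of size $s$ and take any $h \in \mathbb{R}^s$. Then, since $B_\Lambda = Q_1^{-1/2}A_\Lambda$,

$$\|B_\Lambda h\|^2 - \|A_\Lambda h\|^2 = h^T A_\Lambda^T (Q_1^{-1} - I) A_\Lambda h.$$

Since

$$|h^T A_\Lambda^T (Q_1^{-1} - I) A_\Lambda h| \le \|Q_1^{-1} - I\|_2\,\|A_\Lambda h\|^2 \le \eta_1\|A_\Lambda h\|^2,$$

we have that

$$(1 - \eta_1)\|A_\Lambda h\|^2 \le \|B_\Lambda h\|^2 \le (1 + \eta_1)\|A_\Lambda h\|^2.$$

Together with (11), we obtain

$$\alpha_s(1 - \eta_1)\|h\|^2 \le \|B_\Lambda h\|^2 \le \beta_s(1 + \eta_1)\|h\|^2,$$

which concludes the proof. □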
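Both steps of this proof can be checked numerically on one random draw. The NumPy sketch below computes $\eta$, verifies $\|Q_1 - I\|_2 \le \eta$ and the bound (12), and then, for a sample of random supports $\Lambda$ (an exhaustive check over all $\binom{p}{s}$ supports is out of reach), confirms that the extreme eigenvalues of $B_\Lambda^T B_\Lambda$ lie within the factors $1 \pm \eta_1$ of those of $A_\Lambda^T A_\Lambda$. The Gaussian ensemble, dimensions, noise levels and seed are assumptions chosen so that $\eta < 1/2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 50, 5_000, 5
sigma2, sigma02 = 1.0, 1.0
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))

gamma = sigma2 + (p / n) * sigma02
Q1 = (sigma2 * np.eye(n) + sigma02 * (A @ A.T)) / gamma
eta = np.linalg.norm(np.eye(n) - (n / p) * (A @ A.T), ord=2)
eta1 = eta / (1 - eta)

q1_gap = np.linalg.norm(Q1 - np.eye(n), ord=2)                   # should be <= eta
inv_gap = np.linalg.norm(np.linalg.inv(Q1) - np.eye(n), ord=2)   # bound (12): <= eta1

# Whitened matrix B = Q1^{-1/2} A via an eigendecomposition of Q1.
evals, evecs = np.linalg.eigh(Q1)
B = (evecs / np.sqrt(evals)) @ evecs.T @ A

# Sandwich on sampled supports: extreme eigenvalues of the Gram matrices.
ok = True
for _ in range(200):
    lam = rng.choice(p, size=s, replace=False)
    ga = np.linalg.eigvalsh(A[:, lam].T @ A[:, lam])  # ascending order
    gb = np.linalg.eigvalsh(B[:, lam].T @ B[:, lam])
    lo_ok = gb[0] >= (1 - eta1) * ga[0] - 1e-10
    hi_ok = gb[-1] <= (1 + eta1) * ga[-1] + 1e-10
    ok = ok and bool(lo_ok) and bool(hi_ok)

print(f"eta={eta:.3f}, eta1={eta1:.3f}, ||Q1^-1 - I||={inv_gap:.3f}, sandwich holds: {ok}")
```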

3.2 Coherence Analysis 

We now turn to the coherence. Denoting by $A_i$ the $i$th column vector of $A$, the coherence of $A$ is defined as

$$\mu(A) = \max_{i \ne j}\frac{|A_i^T A_j|}{\|A_i\|\,\|A_j\|}.$$
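The coherence is the largest off-diagonal entry, in absolute value, of the Gram matrix of the normalized columns. A minimal NumPy sketch follows; the function name `coherence` is our own. As a check, it uses the standard example of an identity concatenated with a normalized $4 \times 4$ Hadamard matrix, whose coherence is $1/\sqrt{n} = 1/2$.

```python
import numpy as np


def coherence(A: np.ndarray) -> float:
    """mu(A) = max_{i != j} |A_i^T A_j| / (||A_i|| ||A_j||)."""
    U = A / np.linalg.norm(A, axis=0)  # normalize each column
    G = np.abs(U.T @ U)                # absolute normalized Gram matrix
    np.fill_diagonal(G, 0.0)           # exclude the i == j terms
    return float(G.max())


H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H4 = np.kron(H2, H2)                     # 4 x 4 Hadamard matrix
print(coherence(np.hstack([np.eye(4), H4 / 2])))  # 0.5 = 1/sqrt(4)
```

Forming the full $p \times p$ Gram matrix is simple but costs $O(p^2)$ memory, which is acceptable for moderate $p$.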

Proposition 2. Assume that $\eta < 3/4$ in (9). Then

$$\mu(B) \le (1 + \eta_2)\,\mu(A),$$

where

$$\eta_2 := \left(2\sqrt{1 - \eta} - 1\right)^{-2} - 1.$$

Note that $\eta_2 = 2\eta + O(\eta^2)$, with $\eta_2 \le 5\eta$ when $\eta \le 1/4$.
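Since $\eta_1$, $\eta_2$ and $\eta_3 := (1 - \eta)^{-1/2} - 1$ (the last one appears in the proof) are explicit functions of $\eta$, the identity $(1 + \eta_1)/(1 - \eta_3)^2 = 1 + \eta_2$ and the comparison of $\eta_2$ with $\eta$ can be checked on a grid. This NumPy sketch uses illustrative function names and a grid over $(0, 1/4]$.

```python
import numpy as np


def eta1(eta):
    return eta / (1 - eta)


def eta2(eta):
    """eta_2 = (2 sqrt(1 - eta) - 1)^{-2} - 1, defined for 0 <= eta < 3/4."""
    return (2 * np.sqrt(1 - eta) - 1) ** -2.0 - 1


def eta3(eta):
    return (1 - eta) ** -0.5 - 1


grid = np.linspace(1e-4, 0.25, 1000)
# Identity used at the end of the proof of Proposition 2.
identity_ok = np.allclose((1 + eta1(grid)) / (1 - eta3(grid)) ** 2, 1 + eta2(grid))
# The ratio eta_2 / eta starts near 2 and stays below 5 on (0, 1/4].
ratio = eta2(grid) / grid
print(identity_ok, ratio[0], ratio.max())
```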

Proof. To prove the proposition we develop an upper bound on the numerator $|B_i^T B_j|$ of $\mu(B)$, and a lower bound on the denominator elements $\|B_i\|$. For $i \ne j$, we have

$$|B_i^T B_j| = |A_i^T Q_1^{-1} A_j| \le |A_i^T A_j| + |A_i^T (Q_1^{-1} - I) A_j| \le (1 + \eta_1)|A_i^T A_j|,$$

by (12).

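We now lower bound $\|B_i\|$ in terms of $\|A_i\|$ and $\eta$. In parallel with the proof of Proposition 1, we express $Q_1^{-1/2} - I$ as a power series,

$$Q_1^{-1/2} - I = (I - (I - Q_1))^{-1/2} - I = \sum_{k \ge 1} c_k (I - Q_1)^k,$$

where the $c_k$ are the coefficients in the Taylor expansion of $(1 - x)^{-1/2}$, all of which are positive. Taking norms on both sides, we obtain

$$\|Q_1^{-1/2} - I\|_2 \le \sum_{k \ge 1} c_k \|Q_1 - I\|_2^k \le \sum_{k \ge 1} c_k \eta^k = (1 - \eta)^{-1/2} - 1.$$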

Therefore,

$$\|B_i\| = \|Q_1^{-1/2} A_i\| \ge \|A_i\| - \|(Q_1^{-1/2} - I) A_i\| \ge (1 - \eta_3)\|A_i\|,$$

where $\eta_3 := (1 - \eta)^{-1/2} - 1$; note that $1 - \eta_3 > 0$ precisely when $\eta < 3/4$. All together, we have

$$\frac{|B_i^T B_j|}{\|B_i\|\,\|B_j\|} \le \frac{(1 + \eta_1)|A_i^T A_j|}{(1 - \eta_3)^2\,\|A_i\|\,\|A_j\|},$$

with $(1 + \eta_1)/(1 - \eta_3)^2 = 1 + \eta_2$ by definition of $\eta_2$. □
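The lower bound on the denominator rests on $\|Q_1^{-1/2} - I\|_2 \le \eta_3$. The NumPy sketch below checks this inequality and the resulting column-norm bound $\|B_i\| \ge (1 - \eta_3)\|A_i\|$ for every column of one Gaussian draw; the ensemble, dimensions, noise levels and seed are arbitrary assumptions chosen so that $\eta < 3/4$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4_000
sigma2, sigma02 = 1.0, 0.5
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))

gamma = sigma2 + (p / n) * sigma02
Q1 = (sigma2 * np.eye(n) + sigma02 * (A @ A.T)) / gamma
eta = np.linalg.norm(np.eye(n) - (n / p) * (A @ A.T), ord=2)
eta3 = (1 - eta) ** -0.5 - 1

evals, evecs = np.linalg.eigh(Q1)
W = (evecs / np.sqrt(evals)) @ evecs.T  # Q1^{-1/2}
B = W @ A

gap = np.linalg.norm(W - np.eye(n), ord=2)  # ||Q1^{-1/2} - I||_2
col_ratio = np.linalg.norm(B, axis=0) / np.linalg.norm(A, axis=0)  # ||B_i|| / ||A_i||
print(f"eta={eta:.3f}, eta3={eta3:.3f}, gap={gap:.3f}, min ratio={col_ratio.min():.3f}")
```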



4 Conclusion 

Though the CS literature is almost silent on the effect of pre-measurement noise on recovery performance, in this letter we made the point that it may have a substantial impact on SNR. Indeed, we showed that, for the most common measurement schemes used in CS, the model with pre-measurement noise is, after whitening, equivalent to a standard model with only measurement noise, modulo a change in measurement matrix and an increase in the noise variance by a factor of $p/n$. We provided rigorous bounds on the RIP constants and the coherence of the new measurement matrix which show that, as $n, p \to \infty$ with $n/p \to 0$, the constants are essentially unchanged. As the performance of standard recovery methods is formulated in terms of the RIP constants and the coherence, this shows that, in this regime, these methods operate as usual, except that noise folding leads to a noise variance multiplied by $p/n$.